The present application claims priority of U.S. Provisional Patent Application Serial No. 60/390,933, filed June 24, 2002, the disclosure of which is incorporated by reference herein in its entirety.



A. Field of the Invention

The present invention relates generally to hardware accelerators and, more particularly, to a hardware implementation of the pseudo-spectral time-domain (PSTD) method.

B. Description of the Related Art 

Since the advent of the modern computer, a great deal of effort has gone into the development of numerical algorithms for the rigorous solution of electromagnetic problems. Today, popular numerical approaches include the finite-element method (FEM), the method of moments (MoM), modal expansion techniques, boundary integral methods, and time-domain methods such as the finite-volume time-domain (FVTD), multi-resolution time-domain (MRTD), and finite-difference time-domain (FDTD) methods. Each of these algorithms possesses clear advantages and disadvantages depending on the specific application. One algorithm in particular, the pseudo-spectral time-domain (PSTD) method, shows particular promise. In comparison to the above methods, the PSTD technique can require far less memory while maintaining, and in fact improving, the accuracy and versatility of electromagnetic analysis. To this end, recent numerical experiments have confirmed that, for a fixed amount of computational resources, the PSTD method can analyze problems two to three orders of magnitude larger than the FDTD method can with the same level of accuracy.

However, one of the difficulties of the PSTD method is the large number of forward and inverse Fast Fourier Transforms (FFTs and IFFTs) that must be computed, which significantly slows down the analysis. Whereas the computational cost of the FDTD method scales as $O(N)$ in the number of mesh points $N$, the cost of the PSTD method scales as $O(N \log N)$. Thus, from a purely software standpoint, the PSTD method is far less appealing in terms of computational resources.

Over the last several decades, significant effort has been put into realizing application-specific integrated circuits (ASICs) for digital signal processing (DSP). As a result, ASICs are currently available that perform the FFT and IFFT operations in fractions of a microsecond. Thus, the PSTD method is far more attractive to implement in hardware than the FDTD method, owing to the wealth of technology that it can leverage. While the FDTD method can also be realized in hardware, it suffers from the fact that it requires extraordinary amounts of computer memory. In comparison, the PSTD method can analyze problems two to three orders of magnitude larger than other full-wave techniques, the FDTD method in particular, with the same level of accuracy.

Despite the advantages of FDTD acceleration hardware over software-based implementations, the FDTD method requires a tremendous amount of memory, which greatly limits the size of the problems that can be solved. Because the PSTD method requires far fewer samples than the FDTD method (on the order of 1000 times fewer), much larger problems can be solved with the PSTD method given the same amount of memory.

Although PSTD methods are accurate and well defined, current computer system technology limits the speed at which these operations can be performed. Solving a non-trivial problem with the PSTD method can take hours, days, weeks, or even months. Some problems are too large to be solved at all within practical time constraints.

Thus, there is a need in the art to overcome the limitations of the related art and to provide a practical hardware implementation of the PSTD method.



The present invention solves the problems of the related art by providing a hardware-based PSTD processor that capitalizes on the large advancements in the area of DSP ASICs. By combining the PSTD algorithm with modern DSP chips and large-scale FPGAs, the PSTD processor will be capable of solving very large radiation problems in computation times short enough to enable iterative design. Given this potential, a hardware implementation of the PSTD method offers the ability to address an extraordinary array of electromagnetic design problems that have heretofore been impossible to solve.

Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWING

The present invention will become more fully understood from the detailed description given hereinbelow and the accompanying drawing, which is given by way of illustration only and thus is not limitative of the present invention, and wherein:

Fig. 1 is a schematic diagram showing a hardware implementation of the PSTD method in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents thereof.
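
Equations in the PSTD method take the following form:

$$E_a^{n+1} = A\,E_a^{n} + B\,\frac{\partial H_c}{\partial b} + C\,E_a^{\mathrm{inc}} \qquad (1)$$

where a, b, and c are directions (x, y, or z), A, B, and C are coefficients based on the material properties of the medium, $E_a^{\mathrm{inc}}$ is the incident field associated with the node, and the superscript n denotes the time step.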
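Once the spatial derivative is available, equation (1) is a per-node multiply-accumulate. The Python sketch below is illustrative only and assumes equation (1) has the form E_a(new) = A*E_a(old) + B*dH_c/db + C*E_a^inc. The coefficients are supplied per node because they depend on the local medium; the function names and numeric values are hypothetical.

```python
def update_node(e_old, dh_db, e_inc, a, b, c):
    """One node of the assumed equation (1): A*E_a + B*dH_c/db + C*E_a^inc."""
    return a * e_old + b * dh_db + c * e_inc


def update_line(e_line, deriv_line, inc_line, a_line, b_line, c_line):
    """Apply the assumed equation (1) at every node along one mesh line.

    The coefficients are given per node because they depend on the material
    properties of the medium at that node.
    """
    return [
        update_node(e, d, inc, a, b, c)
        for e, d, inc, a, b, c in zip(e_line, deriv_line, inc_line,
                                      a_line, b_line, c_line)
    ]


# Hypothetical two-node line with different coefficients at each node.
e_new = update_line(
    e_line=[1.0, 2.0],
    deriv_line=[0.5, -0.5],
    inc_line=[0.0, 1.0],
    a_line=[1.0, 0.9],
    b_line=[0.2, 0.2],
    c_line=[0.0, 0.3],
)
```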

Unlike the finite-difference time-domain (FDTD) method, where both spatial and temporal derivatives are represented by finite differences, the PSTD method uses FFTs and IFFTs to calculate the spatial derivatives. For example, recall from basic frequency-domain calculations that the spatial derivative of a field, $H_z$, in the y-direction can be written as:

$$\frac{\partial H_z}{\partial y}\bigg|_{(i,:,k)} = \mathrm{IFFT}\Big( j\,k_y \cdot \mathrm{FFT}\big(H_z(i,:,k)\big) \Big) \qquad (2)$$

where $j = \sqrt{-1}$ and $k_y$ is the spatial wavenumber in the y-direction.
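Equation (2) can be checked numerically. The Python sketch below is an illustration, not the claimed hardware. It uses a textbook recursive radix-2 FFT, so the line length must be a power of two. It also assumes the samples along the line are periodic with uniform spacing dy, and it zeroes the Nyquist bin, a common convention for real-valued data that the text does not specify. The function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddled = [cmath.exp(-2j * math.pi * m / n) * odd[m] for m in range(n // 2)]
    return ([e + t for e, t in zip(even, twiddled)]
            + [e - t for e, t in zip(even, twiddled)])


def ifft(spectrum):
    """Inverse FFT using the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spectrum])]


def spectral_derivative(line, dy):
    """Equation (2): d/dy of a periodic line of samples = IFFT(j * k_y * FFT(line))."""
    n = len(line)
    spectrum = fft(line)
    deriv_spectrum = []
    for m, s in enumerate(spectrum):
        if m == n // 2:
            # Nyquist bin: its derivative is ambiguous for real data, so drop it.
            deriv_spectrum.append(0j)
            continue
        # Standard FFT bin ordering: non-negative frequencies, then negative ones.
        k_y = 2 * math.pi * (m if m < n // 2 else m - n) / (n * dy)
        deriv_spectrum.append(1j * k_y * s)
    return [v.real for v in ifft(deriv_spectrum)]


# One line (i, :, k) of H_z with 16 samples over one period: H_z = sin(2*pi*y).
n, dy = 16, 1.0 / 16
hz_line = [math.sin(2 * math.pi * j * dy) for j in range(n)]
dhz_dy = spectral_derivative(hz_line, dy)  # should approximate 2*pi*cos(2*pi*y)
```

For a band-limited periodic line such as this one, the spectral derivative matches the analytic derivative to rounding error, which is the accuracy property that lets PSTD use coarse meshes.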
To compute the spatial derivative of $H_z$ in the y-direction for some node (i, j, k), all of the values of $H_z$ in the y-direction along the line (i, :, k) are required. Although this can require a tremendous amount of data to be fetched (depending on the mesh size), this operation only needs to be performed once for all values that require this derivative along this line. If there are N values in the y-direction, then by fetching N values from memory, the spatial derivative in the y-direction may be computed for N different mesh points. Although there is a significant latency associated with the fetch operation, this operation only needs to be performed once per derivative, per time step.
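The fetch-once, reuse-N-times pattern can be sketched in Python as follows. This is a sketch under stated assumptions: the nested list hz[i][j][k] stands in for RAM, the FFT is a textbook radix-2 routine (so ny must be a power of two), and the field is assumed periodic in y. All names are hypothetical. The returned counter shows that an nx by ny by nz mesh needs only nx * nz FFT/IFFT pairs to produce all nx * ny * nz derivatives.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * m / n) * odd[m] for m in range(n // 2)]
    return [e + t for e, t in zip(even, tw)] + [e - t for e, t in zip(even, tw)]


def line_derivative(line, dy):
    """IFFT(j*k_y*FFT(line)); the inverse uses conj(fft(conj(X)))/n, real part kept."""
    n = len(line)
    spec = fft(line)
    for m in range(n):
        k_y = 2 * math.pi * (m if m < n // 2 else m - n) / (n * dy)
        spec[m] = 0j if m == n // 2 else 1j * k_y * spec[m]  # Nyquist bin zeroed
    return [v.real / n for v in fft([s.conjugate() for s in spec])]


def dhz_dy_all_nodes(hz, dy):
    """dH_z/dy at every node of hz[i][j][k], computed one y-line at a time.

    Returns the derivative mesh and the number of FFT/IFFT pairs performed.
    """
    nx, ny, nz = len(hz), len(hz[0]), len(hz[0][0])
    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    transform_pairs = 0
    for i in range(nx):
        for k in range(nz):
            line = [hz[i][j][k] for j in range(ny)]  # fetch the N = ny values once
            deriv = line_derivative(line, dy)        # one FFT/IFFT pair per line
            transform_pairs += 1
            for j in range(ny):                      # reuse for all N nodes on the line
                out[i][j][k] = deriv[j]
    return out, transform_pairs


# Hypothetical 2 x 8 x 3 mesh with H_z = (i + k + 1) * sin(2*pi*y), periodic in y.
nx, ny, nz, dy = 2, 8, 3, 1.0 / 8
hz = [[[(i + k + 1) * math.sin(2 * math.pi * j * dy) for k in range(nz)]
       for j in range(ny)] for i in range(nx)]
dhz_dy, pairs = dhz_dy_all_nodes(hz, dy)
```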

Despite the large amount of data required to perform the FFT/IFFT operations, it is possible to organize RAM efficiently to maximize throughput. For example, if the values of $H_z$ were stored in RAM in increasing-y order, it would be possible to perform a burst read from RAM. Bursting allows the RAM to fetch contiguous locations very rapidly. Thus, burst reads allow fetching of all of the values necessary to compute the derivative while, at the same time, maximizing the throughput of the RAM. Because spatial derivatives are required in the x-, y-, and z-directions, values would need to be stored in the RAM in three different patterns (increasing-x, increasing-y, and increasing-z). This would permit taking advantage of the bursting capabilities when computing any required derivative. Similarly, there will be three different "update" orders. The update order specifies the order in which nodes in the mesh will be updated. Updating nodes in the same order in which they are stored in RAM allows reuse of the spatial derivatives. Once the spatial derivative of a field is known in a specific direction, that value can be used to update any node along that line. Thus, by solving fields along that line, the same value can be used repeatedly without incurring the latency of the RAM fetch and the FFT/IFFT operations.
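The three storage patterns can be illustrated with a Python sketch that models RAM as a flat list. Each pattern is named by its fastest-varying axis, so an entire line along that axis occupies consecutive addresses and can be fetched as one contiguous slice, which stands in here for a burst read. The address formulas and function names are assumptions for illustration. In this sketch, keeping three copies of a field triples its storage, which is the price of being able to burst in every direction.

```python
def address(i, j, k, shape, order):
    """Linear RAM address of node (i, j, k); `order` names the fastest-varying axis."""
    nx, ny, nz = shape
    if order == "x":                      # increasing-x pattern
        return i + nx * (j + ny * k)
    if order == "y":                      # increasing-y pattern
        return j + ny * (i + nx * k)
    if order == "z":                      # increasing-z pattern
        return k + nz * (i + nx * j)
    raise ValueError(f"unknown storage order: {order!r}")


def store(field, order):
    """Write field[i][j][k] into a flat 'RAM' list using one storage pattern."""
    shape = (len(field), len(field[0]), len(field[0][0]))
    ram = [None] * (shape[0] * shape[1] * shape[2])
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                ram[address(i, j, k, shape, order)] = field[i][j][k]
    return ram


def burst_read(ram, shape, order, i, j, k):
    """Fetch the whole line along `order` through node (i, j, k) as one slice.

    The index along `order` is ignored: the line starts where that index is
    zero, and in the matching storage pattern its values are contiguous.
    """
    start = address(0 if order == "x" else i,
                    0 if order == "y" else j,
                    0 if order == "z" else k, shape, order)
    length = shape["xyz".index(order)]
    return ram[start:start + length]


# Hypothetical 3 x 4 x 5 field whose value encodes its own indices.
shape = (3, 4, 5)
field = [[[100 * i + 10 * j + k for k in range(5)] for j in range(4)] for i in range(3)]
rams = {order: store(field, order) for order in "xyz"}
y_line = burst_read(rams["y"], shape, "y", 1, 0, 2)  # the line (1, :, 2)
```

Iterating over each copy in address order also yields the matching "update" order, since consecutive nodes then share the same line and its derivative.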
Fig. 1 shows one logical data flow for a PSTD accelerator 10 in accordance with the present invention. Note that there are three parallel datapaths 12, 14, 16. These datapaths 12, 14, 16 are completely independent of one another (except for a shared memory subsystem 18), and each solves fields in a given direction (x, y, or z). Each of these datapaths 12, 14, 16 is responsible for solving an equation of the form shown in equation (1).

The flow within each computational path begins by determining the spatial derivative needed for the computation. The necessary values must be fetched from the memory subsystem 18 and streamed into an FFT unit 20 that performs the FFT on the values. Depending on the size of the problem being solved and the capabilities of the FFT unit 20, this computation can easily require several thousand cycles. While the FFT is being computed by the FFT unit 20, the other necessary data, including the primary fields, incident fields, and coefficients, can all be fetched from the memory subsystem 18 or computed. This entire computational path will be pipelined to hide much of the latency of the FFT operation. As results begin emerging from the FFT unit 20, they are streamed through a pipelined complex multiplication unit 22 that performs the multiplication in equation (2) set forth above. At this point, the spatial derivatives will have been computed in the frequency domain. The next step in each datapath 12, 14, 16 is to convert the frequency-domain result back into the time domain by means of an IFFT unit 24 that performs an IFFT. The IFFT unit 24 also incurs a latency of several thousand cycles. As results begin emerging from the IFFT unit 24, they are streamed into a Computation Engine (CE) 26. Given the necessary data and the spatial derivative, the CE 26 solves equation (1) set forth above. Once complete, the CE 26 writes the fields back to the memory subsystem 18.
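The data flow of a single datapath can be modeled in software as a chain of Python generators, one per block of Fig. 1, so that each stage consumes results as the previous stage emits them. This is only a functional sketch: generators reproduce the streaming order, not the cycle-level pipelining or latency hiding of the hardware. The stage names mirror reference numerals 20, 22, 24, and 26. The sketch assumes equation (1) has the form E_a(new) = A*E_a(old) + B*dH_c/db + C*E_a^inc, and the coefficient values are placeholders.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * m / n) * odd[m] for m in range(n // 2)]
    return [e + t for e, t in zip(even, tw)] + [e - t for e, t in zip(even, tw)]


def fft_unit(lines):
    """FFT unit 20: forward transform of each line streamed from memory."""
    for line in lines:
        yield fft(line)


def complex_multiply_unit(spectra, d):
    """Complex multiplication unit 22: multiply bin m by j*k_m (Nyquist bin zeroed)."""
    for spec in spectra:
        n = len(spec)
        out = []
        for m, s in enumerate(spec):
            k = 2 * math.pi * (m if m < n // 2 else m - n) / (n * d)
            out.append(0j if m == n // 2 else 1j * k * s)
        yield out


def ifft_unit(spectra):
    """IFFT unit 24: real part of conj(fft(conj(X))) / n."""
    for spec in spectra:
        n = len(spec)
        yield [v.real / n for v in fft([s.conjugate() for s in spec])]


def computation_engine(derivs, e_lines, inc_lines, a, b, c):
    """CE 26: apply the assumed form of equation (1) at each node of each line."""
    for deriv, e_line, inc_line in zip(derivs, e_lines, inc_lines):
        yield [a * e + b * dv + c * ei for e, dv, ei in zip(e_line, deriv, inc_line)]


def datapath(h_lines, e_lines, inc_lines, d, a, b, c):
    """Chain the stages; 'writing back' is modeled by collecting results in a list."""
    derivs = ifft_unit(complex_multiply_unit(fft_unit(h_lines), d))
    return list(computation_engine(derivs, e_lines, inc_lines, a, b, c))


# Two hypothetical 8-sample lines of H with amplitudes 1 and 2, period 1.
n, d = 8, 1.0 / 8
h_lines = [[amp * math.sin(2 * math.pi * j * d) for j in range(n)] for amp in (1.0, 2.0)]
e_lines = [[1.0] * n for _ in h_lines]
inc_lines = [[0.0] * n for _ in h_lines]
e_new = datapath(h_lines, e_lines, inc_lines, d, a=0.5, b=0.1, c=0.0)
```

Three such datapaths, one per field direction, would run independently and share only the memory model, matching the independence of datapaths 12, 14, 16.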
It will be apparent to those skilled in the art that various modifications and variations can be made in the hardware implementation of the PSTD method of the present invention and in construction of the hardware without departing from the scope or spirit of the invention.

For example, although Fig. 1 shows three parallel computational datapaths 12, 14, 16, more or fewer such datapaths may be provided. Fewer datapaths may be implemented to save on hardware, or extra datapaths may be included to increase parallelism. The actual number of datapaths depends mostly on the capabilities of the memory subsystem 18.

Also, the FFT and IFFT units 20, 24 may or may not be co-located with the complex multiplication unit 22 and the CE unit 26. The FFT and IFFT units 20, 24 may be implemented inside a field-programmable gate array (FPGA). Alternatively, custom FFT/IFFT chips may be purchased, or DSP chips may be programmed to solve the FFT/IFFT equations. The latter configurations require chips external to the FPGA, which would result in a more complex printed circuit board and added latency in transferring data into and out of the FPGA. However, DSP chips are extremely fast and would save FPGA resources. With very advanced FPGAs already on the market, an implementation in which the FFT/IFFT operations are performed inside the FPGA is quite feasible.

By implementing the PSTD method in hardware, a computational speedup is achieved that allows problems to be solved much faster than with current software-based methods, and also allows solving problems that were heretofore unsolvable due to time constraints. The hardware implementation of the PSTD method of the present invention may be used in any field or application utilizing a software method based on a pseudo-spectral time-domain approach.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
WHAT IS CLAIMED IS:

1. A system for performing the pseudo-spectral time-domain (PSTD) method on data, comprising:
a forward fast Fourier transform (FFT) unit calculating a forward fast Fourier transform (FFT) from the data;
a complex multiplication unit receiving the FFT-processed data and calculating a spatial derivative in the frequency domain from the FFT-processed data;
an inverse fast Fourier transform (IFFT) unit converting the spatial derivative in the frequency domain from the complex multiplication unit into the time domain; and
a computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from the IFFT unit.

2. A system as recited in claim 1, wherein the PSTD equation takes the form:

$$E_a^{n+1} = A\,E_a^{n} + B\,\frac{\partial H_c}{\partial b} + C\,E_a^{\mathrm{inc}}$$

where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and $E_a^{\mathrm{inc}}$ is the incident field associated with the node.

3. A system as recited in claim 1, wherein, as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.

4. A system as recited in claim 1, wherein the FFT and IFFT units are provided inside a field-programmable gate array (FPGA).

5. A system as recited in claim 4, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.

6. A system for performing the pseudo-spectral time-domain (PSTD) method on data, comprising:
a plurality of forward fast Fourier transform (FFT) units, each FFT unit calculating a forward fast Fourier transform (FFT) from the data;
a plurality of complex multiplication units, each complex multiplication unit receiving the FFT-processed data from a corresponding FFT unit and calculating a spatial derivative in the frequency domain from the FFT-processed data;
a plurality of inverse fast Fourier transform (IFFT) units, each IFFT unit converting the spatial derivative in the frequency domain from a corresponding complex multiplication unit into the time domain; and
a plurality of computation engines, each computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from a corresponding IFFT unit.

7. A system as recited in claim 6, wherein the PSTD equation takes the form:

$$E_a^{n+1} = A\,E_a^{n} + B\,\frac{\partial H_c}{\partial b} + C\,E_a^{\mathrm{inc}}$$

where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and $E_a^{\mathrm{inc}}$ is the incident field associated with the node.
8. A system as recited in claim 6, wherein, as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.

9. A system as recited in claim 6, wherein the plurality of FFT and IFFT units are provided inside a field-programmable gate array (FPGA).

10. A system as recited in claim 9, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.

11. A computer hardware configuration for performing the pseudo-spectral time-domain (PSTD) method on data, comprising:
a forward fast Fourier transform (FFT) unit calculating a forward fast Fourier transform (FFT) from the data;
a complex multiplication unit receiving the FFT-processed data and calculating a spatial derivative in the frequency domain from the FFT-processed data;
an inverse fast Fourier transform (IFFT) unit converting the spatial derivative in the frequency domain from the complex multiplication unit into the time domain; and
a computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from the IFFT unit.

12. A computer hardware configuration as recited in claim 11, wherein the PSTD equation takes the form:

$$E_a^{n+1} = A\,E_a^{n} + B\,\frac{\partial H_c}{\partial b} + C\,E_a^{\mathrm{inc}}$$

where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and $E_a^{\mathrm{inc}}$ is the incident field associated with the node.

13. A computer hardware configuration as recited in claim 11, wherein, as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.

14. A computer hardware configuration as recited in claim 11, wherein the FFT and IFFT units are provided inside a field-programmable gate array (FPGA).

15. A computer hardware configuration as recited in claim 14, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.

16. A computer hardware configuration for performing the pseudo-spectral time-domain (PSTD) method on data, comprising:
a plurality of forward fast Fourier transform (FFT) units, each FFT unit calculating a forward fast Fourier transform (FFT) from the data;
a plurality of complex multiplication units, each complex multiplication unit receiving the FFT-processed data from a corresponding FFT unit and calculating a spatial derivative in the frequency domain from the FFT-processed data;
a plurality of inverse fast Fourier transform (IFFT) units, each IFFT unit converting the spatial derivative in the frequency domain from a corresponding complex multiplication unit into the time domain; and
a plurality of computation engines, each computation engine solving a PSTD equation based upon the spatial derivative in the time domain received from a corresponding IFFT unit.



17. A computer hardware configuration as recited in claim 16, wherein the PSTD equation takes the form:

$$E_a^{n+1} = A\,E_a^{n} + B\,\frac{\partial H_c}{\partial b} + C\,E_a^{\mathrm{inc}}$$

where a, b, and c are directions (x, y, and z), A, B, and C are coefficients based on material properties of a medium, and $E_a^{\mathrm{inc}}$ is the incident field associated with the node.

18. A computer hardware configuration as recited in claim 16, wherein, as the FFT is being calculated, primary fields, incident fields, and coefficients are being fetched by the system.

19. A computer hardware configuration as recited in claim 16, wherein the plurality of FFT and IFFT units are provided inside a field-programmable gate array (FPGA).

20. A computer hardware configuration as recited in claim 19, wherein the FFT and IFFT calculations are performed by a digital signal processing (DSP) chip.